Effectiveness and Limitations of Statistical Spam Filters
Abstract
Spam not only clogs Internet traffic by consuming a substantial amount of network bandwidth; it is also a source of e-mail-borne viruses, spyware, adware, and Trojan horses. It is also used to carry out denial-of-service, directory-harvesting, and phishing attacks that directly cause financial losses. Further, the contents of spam are often offensive, containing adult-oriented and fraudulent material that recipients find objectionable. Several anti-spam procedures are currently employed to distinguish spam from legitimate e-mail; however, spammers and phishers use dynamic spam structures that obfuscate e-mail content to circumvent these procedures. Alongside other technological procedures, various adaptive learning filters have been developed that allow an algorithm to learn continually what kinds of e-mail or e-mail content a recipient would typically process and expect to see in the normal course of business. These filters are based on complex statistical techniques that classify future e-mails based on the word content of accepted e-mails. The statistical techniques employed in these filters separate an incoming e-mail into tokens and assign a probability value to each token. The token probabilities are then combined to calculate an overall spam probability, and the incoming e-mail is scored accordingly as spam, probably spam, or legitimate e-mail.
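The tokenize, score, and combine pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the method evaluated in the paper: the naive Bayes style combination of per-token odds, the add-one smoothing, the equal class priors, the class name `TokenFilter`, and the cutoffs 0.9 and 0.4 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split a message into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TokenFilter:
    """Toy statistical filter (hypothetical, for illustration only)."""

    def __init__(self):
        self.spam_counts = Counter()  # number of spam messages containing a token
        self.ham_counts = Counter()   # number of legitimate messages containing a token
        self.n_spam = 0
        self.n_ham = 0

    def train(self, text, is_spam):
        tokens = set(tokenize(text))  # count each token once per message
        if is_spam:
            self.spam_counts.update(tokens)
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens)
            self.n_ham += 1

    def token_probability(self, token):
        """Estimate P(spam | token), assuming equal priors and add-one smoothing."""
        p_token_spam = (self.spam_counts[token] + 1) / (self.n_spam + 2)
        p_token_ham = (self.ham_counts[token] + 1) / (self.n_ham + 2)
        return p_token_spam / (p_token_spam + p_token_ham)

    def spam_probability(self, text):
        """Combine token probabilities by summing their log-odds."""
        log_odds = 0.0
        for token in set(tokenize(text)):
            p = self.token_probability(token)  # strictly between 0 and 1
            log_odds += math.log(p) - math.log(1.0 - p)
        # Numerically stable logistic function.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1.0 + z)

    def classify(self, text, spam_cutoff=0.9, ham_cutoff=0.4):
        """Map the overall probability to one of three labels (cutoffs are assumed)."""
        p = self.spam_probability(text)
        if p >= spam_cutoff:
            return "spam"
        if p >= ham_cutoff:
            return "probably spam"
        return "legitimate"
```

After training on a handful of labeled messages, `classify` returns one of the three labels named in the abstract. A message made up entirely of unseen tokens scores 0.5 under these assumptions and therefore falls in the middle band, which shows how much the result depends on the chosen cutoffs.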
Similar resources
A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection
E-mail is one of the fastest and most economical means of communication. The growing number of e-mail users has caused an increase in spam in recent years. Spam not only costs users money, time, and bandwidth, but has also become a risk to the efficiency, reliability, and security of a network. Spam developers are always trying to find ways to escape the existing filters therefore new filters to de...
Good Word Attacks on Statistical Spam Filters
Unsolicited commercial email is a significant problem for users and providers of email services. While statistical spam filters have proven useful, senders of spam are learning to bypass these filters by systematically modifying their email messages. In a good word attack, one of the most common techniques, a spammer modifies a spam message by inserting or appending words indicative of legitima...
Machine Learning for Naive Bayesian Spam Filter Tokenization
Background: Traditional client-level spam filters rely on rule-based heuristics. While these filters can be effective, they have several limitations. The rules must be created by hand. This requires the filter creator to examine a corpus of spam and cull out characteristics. This is a time-consuming process, and it is easy to miss rules which are quite effective at detecting spam. While the word ”...
Training SpamAssassin with Active Semi-supervised Learning
Most spam filters include some automatic pattern classifiers based on machine learning and pattern recognition techniques. Such classifiers often require a large training set of labeled emails to attain a good discriminant capability between spam and legitimate emails. In addition, they must be frequently updated because of the changes introduced by spammers to their emails to evade spam filter...
Spam Filter Analysis (arXiv:cs.CR/0402046v1, 19 Feb 2004)
Unsolicited bulk email (aka. spam) is a major problem on the Internet. To counter spam, several techniques, ranging from spam filters to mail protocol extensions like hashcash, have been proposed. In this paper we investigate the effectiveness of several spam filtering techniques and technologies. Our analysis was performed by simulating email traffic under different conditions. We show that ge...
Journal title:
- CoRR
Volume abs/0910.2540, Issue -
Pages -
Publication date 2009